@pymia pymia commented Sep 23, 2025

Implements SageMaker Serverless Inference endpoints as requested in issue #23148.

  • Add ServerlessProductionVariantProps interface with maxConcurrency, memorySizeInMB, and provisionedConcurrency
  • Extend EndpointConfig to support serverless variants alongside existing instance variants
  • Add comprehensive validation for serverless configuration parameters
  • Enforce mutual exclusivity between instance and serverless variants
  • Add CloudFormation template generation for ServerlessConfig properties
  • Include extensive test coverage for validation scenarios and error cases

Issue #23148

Closes #23148.

Reason for this change

AWS SageMaker Serverless Inference is not supported by the CDK SageMaker L2 constructs. Users can only configure instance-based endpoints and have no L2 way to choose serverless inference, a cost-effective option for intermittent or unpredictable traffic patterns.

This feature was explicitly planned in the original SageMaker Endpoint L2 construct RFC, whose Instance-prefixed class names were chosen to leave room for Serverless-prefixed analogs.

Description of changes

Implements AWS SageMaker Serverless Inference support in CDK SageMaker L2 constructs, enabling cost-effective serverless endpoints for intermittent workloads:

  • New ServerlessProductionVariantProps interface extending ProductionVariantProps with AWS-compliant serverless properties:
    • maxConcurrency: 1-200 range (required)
    • memorySizeInMB: 1024-6144MB in 1GB increments (required)
    • provisionedConcurrency: 1-200 range, optional, must be ≤ maxConcurrency
  • New addServerlessProductionVariant() method with comprehensive input validation
  • Extended EndpointConfigProps with optional serverlessProductionVariant property
  • Mutual exclusivity enforcement between instance and serverless variants per AWS constraints
  • Single serverless variant limit per endpoint configuration (AWS limitation)
  • Comprehensive synthesis-time validation with clear, actionable error messages
  • CloudFormation integration leveraging existing L1 construct ServerlessConfig support
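The validation rules listed above can be sketched in plain TypeScript. This is a minimal, hypothetical sketch rather than the construct's actual code: the `ServerlessVariantSketch` interface and the `inRange`, `validateServerlessVariant` and `validateVariantMix` helpers are illustrative names. It uses the 1-200 concurrency bound stated in this description; a later commit in this PR raises the bound to 1-1000, so the limit is kept in a single constant.

```typescript
// Hypothetical props shape for illustration; the construct's real interface may differ.
interface ServerlessVariantSketch {
  variantName: string;
  maxConcurrency: number;
  memorySizeInMB: number;
  provisionedConcurrency?: number;
}

// Upper concurrency bound as stated in this description (a later commit raises it to 1000).
const MAX_CONCURRENCY_LIMIT = 200;
const MEMORY_STEP_MB = 1024;
const MIN_MEMORY_MB = 1024;
const MAX_MEMORY_MB = 6144;

// True when n is an integer within [lo, hi].
function inRange(n: number, lo: number, hi: number): boolean {
  return Number.isInteger(n) && n >= lo && n <= hi;
}

// Returns human-readable errors; an empty array means the props pass every check.
function validateServerlessVariant(v: ServerlessVariantSketch): string[] {
  const errors: string[] = [];
  if (!inRange(v.maxConcurrency, 1, MAX_CONCURRENCY_LIMIT)) {
    errors.push(`maxConcurrency must be an integer from 1 to ${MAX_CONCURRENCY_LIMIT}, got ${v.maxConcurrency}`);
  }
  // Valid sizes are 1024, 2048, 3072, 4096, 5120 and 6144 MB (1 GB increments).
  if (!inRange(v.memorySizeInMB, MIN_MEMORY_MB, MAX_MEMORY_MB) || v.memorySizeInMB % MEMORY_STEP_MB !== 0) {
    errors.push(`memorySizeInMB must be a multiple of ${MEMORY_STEP_MB} from ${MIN_MEMORY_MB} to ${MAX_MEMORY_MB}, got ${v.memorySizeInMB}`);
  }
  const provisioned = v.provisionedConcurrency;
  if (provisioned !== undefined) {
    if (!inRange(provisioned, 1, MAX_CONCURRENCY_LIMIT)) {
      errors.push(`provisionedConcurrency must be an integer from 1 to ${MAX_CONCURRENCY_LIMIT}, got ${provisioned}`);
    } else if (provisioned > v.maxConcurrency) {
      errors.push(`provisionedConcurrency (${provisioned}) must not exceed maxConcurrency (${v.maxConcurrency})`);
    }
  }
  return errors;
}

// Config-level rules: instance and serverless variants are mutually exclusive,
// and at most one serverless variant is allowed per endpoint configuration.
function validateVariantMix(instanceCount: number, serverlessCount: number): string[] {
  const errors: string[] = [];
  if (instanceCount > 0 && serverlessCount > 0) {
    errors.push('Instance and serverless production variants cannot be mixed in one endpoint configuration');
  }
  if (serverlessCount > 1) {
    errors.push('Only one serverless production variant is allowed per endpoint configuration');
  }
  return errors;
}
```

For example, `maxConcurrency: 10, memorySizeInMB: 2048, provisionedConcurrency: 5` yields an empty array, while `memorySizeInMB: 1500` yields one error. Collecting all errors instead of throwing on the first one lets a caller report every problem in a single synthesis run.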

Usage Example:

```ts
import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const model: sagemaker.IModel;

// Runs inside a Stack (or other Construct) constructor, so `this` is the scope.
// Create a serverless endpoint configuration
const endpointConfig = new sagemaker.EndpointConfig(this, 'ServerlessEndpointConfig', {
  serverlessProductionVariant: {
    model: model,
    variantName: 'serverlessVariant',
    maxConcurrency: 10,
    memorySizeInMB: 2048,
    provisionedConcurrency: 5, // optional
  },
});
```

Describe any new or updated permissions being added

N/A - No new IAM permissions required. Leverages existing SageMaker model and endpoint permissions.

Description of how you validated changes

  • Unit tests: Added 12 comprehensive serverless variant tests covering all validation scenarios:

    • Memory size validation (1024-6144MB in 1GB increments)
    • Concurrency range validation (1-200 for both max and provisioned)
    • Mutual exclusivity enforcement between instance and serverless variants
    • Single serverless variant limit per AWS constraints
    • Cross-environment model compatibility validation
    • Error condition testing with clear error messages
    • CloudFormation template generation verification
  • Integration tests: Extended the existing integration test with a serverless endpoint configuration and verified that the generated CloudFormation template contains the correct ServerlessConfig properties:

    ```yaml
    ServerlessEndpointConfig:
      Type: AWS::SageMaker::EndpointConfig
      Properties:
        ProductionVariants:
          - ServerlessConfig:
              MaxConcurrency: 10
              MemorySizeInMB: 2048
              ProvisionedConcurrency: 5
            VariantName: serverlessVariant
    ```
  • Test results: 63/63 unit tests and 4/4 integration tests pass, with no regressions detected across 16,024+ CDK tests
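The mapping from the camelCase construct props to the PascalCase `ServerlessConfig` keys in the template excerpt above can be sketched as follows. This is a hypothetical illustration: `renderServerlessVariant` and both interfaces are assumed names, and the output covers only the keys shown in that excerpt (a real production variant also carries other fields, such as the model name).

```typescript
// Hypothetical camelCase props, mirroring the fields in the usage example.
interface ServerlessVariantProps {
  variantName: string;
  maxConcurrency: number;
  memorySizeInMB: number;
  provisionedConcurrency?: number;
}

// Template-side shape, limited to the keys shown in the excerpt above.
interface ServerlessProductionVariantJson {
  VariantName: string;
  ServerlessConfig: {
    MaxConcurrency: number;
    MemorySizeInMB: number;
    ProvisionedConcurrency?: number;
  };
}

// Maps construct props onto the PascalCase keys that appear in the synthesized template.
function renderServerlessVariant(props: ServerlessVariantProps): ServerlessProductionVariantJson {
  return {
    VariantName: props.variantName,
    ServerlessConfig: {
      MaxConcurrency: props.maxConcurrency,
      MemorySizeInMB: props.memorySizeInMB,
      // Only emit ProvisionedConcurrency when it was set, so an unset value never reaches the template.
      ...(props.provisionedConcurrency !== undefined
        ? { ProvisionedConcurrency: props.provisionedConcurrency }
        : {}),
    },
  };
}
```

Called with the values from the usage example, this produces the same `VariantName` and `ServerlessConfig` values as the YAML excerpt; key order differs, which has no effect on the template's meaning.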

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1 beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK labels Sep 23, 2025
@aws-cdk-automation aws-cdk-automation requested a review from a team September 23, 2025 08:29
@aws-cdk-automation aws-cdk-automation left a comment

(This review is outdated)

@pymia pymia force-pushed the feature/sagemaker-serverless-variants-23148 branch 2 times, most recently from aad0c97 to 78ef21c Compare September 23, 2025 13:31
@pahud pahud marked this pull request as draft September 23, 2025 14:23
@pahud pahud self-assigned this Sep 23, 2025
pahud commented Sep 23, 2025

taking a look.

pahud commented Sep 23, 2025

❌ Features must contain a change to a README file.
❌ Features must contain a change to an integration test file and the resulting snapshot.

As this is a new feature, we need to:

  1. update the README with a very focused and minimal description.
  2. add a new integ test or refresh the existing relevant integ tests, and update the snapshots.

@aws-cdk-automation aws-cdk-automation dismissed their stale review September 23, 2025 15:05

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@pymia pymia marked this pull request as ready for review September 23, 2025 15:37
@pahud pahud marked this pull request as draft September 23, 2025 15:37
@pymia pymia force-pushed the feature/sagemaker-serverless-variants-23148 branch from 04fc444 to 5ff7875 Compare September 24, 2025 15:31
@pymia pymia force-pushed the feature/sagemaker-serverless-variants-23148 branch 4 times, most recently from 2ab372b to d8a868d Compare September 29, 2025 14:58
@pahud pahud removed their assignment Sep 29, 2025
@pahud pahud marked this pull request as ready for review September 29, 2025 16:29
@abidhasan-aws abidhasan-aws self-requested a review September 30, 2025 12:52
@abidhasan-aws abidhasan-aws self-assigned this Sep 30, 2025
@abidhasan-aws abidhasan-aws removed their request for review September 30, 2025 13:40
@abidhasan-aws abidhasan-aws removed their assignment Sep 30, 2025
@abidhasan-aws abidhasan-aws self-requested a review September 30, 2025 14:49
@abidhasan-aws abidhasan-aws self-assigned this Sep 30, 2025
@pymia pymia had a problem deploying to deployment-integ-test November 11, 2025 13:01 — with GitHub Actions Error
pymia and others added 7 commits November 11, 2025 14:16
Implements SageMaker Serverless Inference endpoints as requested in issue aws#23148.

- Add ServerlessProductionVariantProps interface with maxConcurrency, memorySizeInMB, and provisionedConcurrency
- Extend EndpointConfig to support serverless variants alongside existing instance variants
- Add comprehensive validation for serverless configuration parameters
- Enforce mutual exclusivity between instance and serverless variants
- Add CloudFormation template generation for ServerlessConfig properties
- Include extensive test coverage for validation scenarios and error cases

Closes aws#23148
…less inference

- Add comprehensive serverless inference documentation to SageMaker alpha README
- Update integration test with serverless endpoint configuration examples
- Include verification comments for both instance-based and serverless endpoints
- Generate CloudFormation snapshots with proper ServerlessConfig properties

Addresses reviewer feedback requiring README documentation and integration test coverage for the new serverless inference feature.
…ch AWS specs

- Update maxConcurrency validation range from 1-200 to 1-1000
- Update provisionedConcurrency validation range from 1-200 to 1-1000
- Fix memory size documentation from 3008MB to 3072MB in requirements
- Add comprehensive test coverage for upper-bound validation
- Update TypeScript definitions and JSDoc comments

This aligns the implementation with AWS SageMaker serverless endpoint specifications and RFC 431 requirements for L2 constructs.
- Remove standalone readme_serverless_section.md file
- Remove enhanced_integ_test.ts file
- Consolidate serverless tests into existing integ.endpoint-config.ts
- Add comprehensive serverless test cases (minimal, full, boundary values)
- Maintain existing documentation in main SageMaker README
- Keep mutual exclusivity validation with AWS docs justification

Addresses review comments in PR aws#35557
- Remove redundant hasInstanceVariants variable in validation logic
- Add SageMaker Serverless Inference documentation link to README
- Throw error in renderServerlessProductionVariant when undefined
- Add validation in renderInstanceProductionVariants for empty variants
- Add comprehensive API assertions to integration tests
- Update integration test snapshots

Addresses review comments from abidhasan-aws in PR aws#35557
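The two rendering guards described in this commit can be sketched as follows. The function names `renderServerlessProductionVariant` and `renderInstanceProductionVariants` come from the commit message, but their signatures, the simplified variant types, and the `renderProductionVariants` dispatcher are hypothetical and only illustrate the idea.

```typescript
// Hypothetical, simplified variant shapes; the construct's real types differ.
interface InstanceVariant {
  variantName: string;
  instanceType: string;
  initialInstanceCount: number;
}
interface ServerlessVariant {
  variantName: string;
  maxConcurrency: number;
  memorySizeInMB: number;
}

// Guard: fail loudly instead of silently rendering nothing when no serverless variant exists.
function renderServerlessProductionVariant(variant: ServerlessVariant | undefined) {
  if (variant === undefined) {
    throw new Error('renderServerlessProductionVariant called without a serverless variant');
  }
  return [{
    VariantName: variant.variantName,
    ServerlessConfig: { MaxConcurrency: variant.maxConcurrency, MemorySizeInMB: variant.memorySizeInMB },
  }];
}

// Guard: an endpoint configuration without any production variant is invalid.
function renderInstanceProductionVariants(variants: InstanceVariant[]) {
  if (variants.length === 0) {
    throw new Error('At least one instance production variant is required');
  }
  return variants.map((v) => ({
    VariantName: v.variantName,
    InstanceType: v.instanceType,
    InitialInstanceCount: v.initialInstanceCount,
  }));
}

// Picks exactly one rendering path; mutual exclusivity is assumed to be validated beforehand.
function renderProductionVariants(instance: InstanceVariant[], serverless?: ServerlessVariant) {
  return serverless !== undefined
    ? renderServerlessProductionVariant(serverless)
    : renderInstanceProductionVariants(instance);
}
```

Throwing inside the render helpers turns an inconsistent internal state into a synthesis-time error rather than an endpoint configuration with an empty `ProductionVariants` list.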
@pymia pymia force-pushed the feature/sagemaker-serverless-variants-23148 branch from a1c312d to f8571d9 Compare November 11, 2025 13:16
@pymia pymia temporarily deployed to deployment-integ-test November 11, 2025 13:17 — with GitHub Actions Inactive
@pymia pymia deployed to deployment-integ-test November 11, 2025 14:55 — with GitHub Actions Active
abidhasan-aws previously approved these changes Nov 11, 2025
mergify bot commented Nov 11, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot had a problem deploying to deployment-integ-test November 11, 2025 16:38 Failure
mergify bot commented Nov 11, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

You may have to fix your CI before adding the pull request to the queue again.
If you update this pull request, to fix the CI, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

@abidhasan-aws abidhasan-aws added the pr/do-not-merge This PR should not be merged at this time. label Nov 11, 2025
@abidhasan-aws abidhasan-aws removed pr/do-not-merge This PR should not be merged at this time. pr/needs-integration-tests-deployment Requires the PR to deploy the integration test snapshots. labels Nov 12, 2025
@mergify mergify bot dismissed abidhasan-aws’s stale review November 12, 2025 09:28

Pull request has been modified.

mergify bot commented Nov 12, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 3f5c5ac into aws:main Nov 12, 2025
19 of 20 checks passed
Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 12, 2025
Labels

beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1

Development

Successfully merging this pull request may close these issues.

sagemaker: Support serverless variants for endpoints

4 participants